In many social dilemmas, individuals tend to generate a situation with low payoffs instead
of a system optimum ("tragedy of the commons"). Is the routing of traffic a similar
problem? In order to address this question, we present experimental results on humans
playing a route choice game in a computer laboratory, which allow one to study decision
behavior in repeated games beyond the Prisoner's Dilemma. We will focus on whether
individuals manage to find a cooperative and fair solution compatible with the
system-optimal road usage. We find that individuals tend towards a user equilibrium with equal
travel times in the beginning. However, after many iterations, they often establish a
coherent oscillatory behavior, as taking turns performs better than applying pure or
mixed strategies. The resulting behavior is fair and compatible with system-optimal
road usage. In spite of the complex dynamics leading to coordinated oscillations, we
have identified mathematical relationships quantifying the observed transition process.
Our main experimental discoveries for 2- and 4-person games can be explained with a
novel reinforcement learning model for an arbitrary number of persons, which is based
on past experience and trial-and-error behavior. Gains in the average payoff seem to be
an important driving force for the innovation of time-dependent response patterns, i.e.
the evolution of more complex strategies. Our findings are relevant for decision support
systems and routing in traffic or data networks.



Keywords: Game theory; reinforcement learning; multi-agent simulation. 



1. Introduction 

Congestion is a burden of today's traffic systems, affecting the economic prosperity
of modern societies. Yet, distributing vehicles optimally over alternative routes is
still a challenging problem, and scarce resources (street capacity) are used in
an inefficient way. Route choice is based on interactive, but decentralized individual
decisions, which cannot be well described by classical utility-based decision models [27].
Similar to the minority game [16,39,43], it is reasonable for different people
to react to the same situation or information in different ways. As a consequence,
individuals tend to develop characteristic response patterns or roles [26]. Thanks to
this differentiation process, individuals learn to coordinate better in the course of
time. However, according to current knowledge, selfish routing does not establish
the system optimum of minimum overall travel times. It rather tends to establish
the Wardrop equilibrium, a special user or Nash equilibrium characterized by
equal travel times on all alternative routes chosen from a certain origin to a given
destination (while routes with longer travel times are not taken) [71].

Since Pigou [53] , it has been suggested to resolve the problem of inefficient road 
usage by congestion charges, but are they needed? Is the missing establishment of 
a system optimum just a problem of varying traffic conditions and changing origin-
destination pairs, which make route-choice decisions comparable to one-shot games?
Or would individuals in an iterated setting of a day-to-day route choice game with 
identical conditions spontaneously establish cooperation in order to increase their 
returns, as the folk theorem suggests [6]? 

What would such cooperation look like? Taking turns could be a suitable solution [62].
While simple symmetrical cooperation is typically found for the repeated
Prisoner's Dilemma [2,3,44-46,49,52,55,59,64,67,69], emergent alternating
reciprocity has been recently discovered for the games Leader and Battle of the
Sexes [11] (see footnote a). Note that such coherent oscillations are a time-dependent, but
deterministic form of individual decision behavior, which can establish a persistent
phase-coordination, while mixed strategies, i.e. statistically varying decisions, can
establish cooperation only by chance or in the statistical average. This difference is
particularly important when the number of interacting persons is small, as in the
particular route choice game discussed below.

Note that oscillatory behavior has been found in iterated games before: 

• In the rock-paper-scissors game [67], cycles are predicted by the game- 
dynamical equations due to unstable stationary solutions [28] . 

• Oscillations can also result from coordination problems [1,29,31,33], at the
cost of reduced system performance.

• Moreover, blinker strategies may survive in repeated games played by a 
mixture of finite automata [5] or result through evolutionary strategies 
[11,15,16,38,39,42,43,74]. 

However, these oscillation-generating mechanisms are clearly to be distinguished
from the establishment of phase-coordinated alternating reciprocity we are interested
in (coherent oscillatory cooperation to reach the system optimum).

a See Fig. 2 for a specification of these games.

Our paper is organized as follows: In Section 2, we will formally introduce the
route choice game for N players, including issues like the Wardrop equilibrium [71]
and the Braess paradox [10]. Section 3 will focus on the special case of the 2-person
route choice game, compare it with the minority game [1,15,16,38,39,42,43,74],
and discuss its place in the classification scheme of symmetrical 2x2 games. This
section will also reveal some apparent shortcomings of the previous game-theoretical
literature:

• While it is commonly stated that among the 12 ordinally distinct, symmet- 
rical 2x2 games [11,57] only 4 archetypical 2x2 games describe a strategical 
conflict (the Prisoner's Dilemma, the Battle of the Sexes, Chicken, and 
Leader) [11, 18, 56], we will show that, for specific payoffs, the route choice 
game (besides Deadlock) also represents an interesting strategical conflict, 
at least for iterated games. 

• The conclusion that conservative driver behavior is best, i.e. it does not pay 
off to change routes [7,65,66], is restricted to the special case of route-choice 
games with a system-optimal user equilibrium. 

• It is only half the truth that cooperation in the iterated Prisoner's Dilemma 
is characterized by symmetrical behavior [11]. Phase-coordinated asym- 
metric reciprocity is possible as well, as in some other symmetrical 2x2 
games [11]. 

New perspectives arise by less restricted specifications of the payoff values. 

In Section 4, we will discuss empirical results of laboratory experiments with humans
[12,18,32]. According to these, reaching a phase-coordinated alternating state
is only one problem. Exploratory behavior and suitable punishment strategies are
important to establish asymmetric oscillatory reciprocity as well [11,20]. Moreover,
we will discuss several coefficients characterizing individual behavior and chances for
the establishment of cooperation. In Section 5, we will present multi-agent computer
simulations of our observations, based on a novel win-stay, lose-shift [50,54] strategy,
which is a special kind of reinforcement learning strategy [40]. This approach is
based on individual historical experience [13] and, thereby, clearly differs from the
selection of the best-performing strategy in a set of hypothetical strategies as assumed
in studies based on evolutionary or genetic algorithms [5,11,15,16,39,42,43].
The final section will summarize our results and discuss their relevance for game
theory and possible applications such as data routing algorithms [35,72], advanced
driver information systems [8,14,30,37,41,63,70,73], or road pricing [53].

2. The Route Choice Game 

In the following, we will investigate a scenario with two alternative routes between a
certain origin and a given destination, say, between two places or towns A and B (see
Fig. 1). We are interested in the case where both routes have different capacities,
say a freeway and a subordinate or side road. While the freeway is faster when it is
empty, it may be reasonable to use the side road when the freeway is congested.

Fig. 1. Illustration of the investigated day-to-day route choice scenario. We study the dynamic
decision behavior in a repeated route choice game, where a given destination can be reached from
a given origin via two different routes, a freeway (route 1) and a side road (route 2).

The "success" of taking route $i$ could be measured in terms of its inverse travel
time $1/T_i(N_i) = V_i(N_i)/L_i$, where $L_i$ is the length of route $i$ and $V_i(N_i)$ the
average velocity when $N_i$ of the $N$ drivers have selected route $i$. One may roughly
approximate the average vehicle speed $V_i$ on route $i$ by the linear relationship [24]

$$V_i(N_i) = V_i^0 \left(1 - \frac{N_i(t)}{N_i^{\max}}\right), \qquad (1)$$

where $V_i^0$ denotes the maximum velocity (speed limit) and $N_i^{\max}$ the capacity,
i.e. the maximum possible number of vehicles on route $i$. With $A_i = V_i^0/L_i$ and
$B_i = V_i^0/(N_i^{\max} L_i)$, the inverse travel time then obeys the relationship

$$1/T_i(N_i) = A_i - B_i N_i, \qquad (2)$$

which is linearly decreasing with the road occupancy $N_i$. Other monotonically falling
relationships $V_i(N_i)$ would make the expression for the inverse travel times non-linear,
but they would probably not lead to qualitatively different conclusions.
The user equilibrium of equal travel times is found for a fraction

$$\frac{N_1^e}{N} = \frac{B_2}{B_1 + B_2} + \frac{1}{N} \, \frac{A_1 - A_2}{B_1 + B_2} \qquad (3)$$

of persons choosing route 1. In contrast, the system optimum corresponds to the
maximum of the overall inverse travel times $N_1/T_1(N_1) + N_2/T_2(N_2)$ and is found
for the fraction

$$\frac{N_1^o}{N} = \frac{B_2}{B_1 + B_2} + \frac{1}{2N} \, \frac{A_1 - A_2}{B_1 + B_2} \qquad (4)$$

of 1-decisions. The difference between both fractions vanishes in the limit $N \to \infty$.
Therefore, only experiments with a few players allow one to find out whether the
test persons adapt to the user equilibrium or to the system optimum. We will see
that both cases have completely different dynamical implications: While the most
successful strategy to establish the user equilibrium is to stick to the same decision
in subsequent iterations [27,65,66], the system optimum can only be reached by a
time-dependent strategy (at least, if no participant is ready to pay for the profits
of others).
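The two fractions of Eqs. (3) and (4) can be evaluated numerically to see how quickly their difference shrinks with the number of players. The following is a minimal sketch only: the function names and the values of A1, A2, B1, B2 are hypothetical and chosen purely for illustration, they are not parameters of the experiments.

```python
from fractions import Fraction


def user_equilibrium_fraction(a1, a2, b1, b2, n):
    """Eq. (3): fraction N_1/N for which both routes have equal inverse travel times."""
    return b2 / (b1 + b2) + (a1 - a2) / (n * (b1 + b2))


def system_optimum_fraction(a1, a2, b1, b2, n):
    """Eq. (4): fraction N_1/N that maximizes N_1/T_1(N_1) + N_2/T_2(N_2)."""
    return b2 / (b1 + b2) + (a1 - a2) / (2 * n * (b1 + b2))


def overall_inverse_travel_time(n1, a1, a2, b1, b2, n):
    """Objective N_1/T_1(N_1) + N_2/T_2(N_2) with 1/T_i(N_i) = A_i - B_i*N_i, Eq. (2)."""
    n2 = n - n1
    return n1 * (a1 - b1 * n1) + n2 * (a2 - b2 * n2)


# Hypothetical parameter values (assumption, not taken from the paper).
A1, A2 = Fraction(1), Fraction(9, 10)
B1, B2 = Fraction(1, 50), Fraction(3, 100)

for n in (10, 100, 1000):
    ue = user_equilibrium_fraction(A1, A2, B1, B2, n)
    so = system_optimum_fraction(A1, A2, B1, B2, n)
    # The gap equals (A1 - A2) / (2 N (B1 + B2)) and vanishes for large N.
    print(n, float(ue), float(so), float(ue - so))
```

With these illustrative values, the gap between the two fractions is 1/N, so it is clearly visible for N = 10 but negligible for N = 1000, in line with the argument for using small groups.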

Note that alternative routes can reach comparable travel times only when the
total number $N$ of vehicles is large enough to fulfil the relationships $P_1(N) <
P_2(0) = A_2$ and $P_2(N) < P_1(0) = A_1$. Our route choice game will address this traffic
regime and additionally assume $N < N_i^{\max}$. The case $N_i = N_i^{\max}$ corresponds to a
complete gridlock on route $i$.

Finally, it may be interesting to connect the previous quantities with the vehicle
densities $\rho_i$ and the traffic flows $Q_i$. If route $i$ consists of $I_i$ lanes, the relation with
the average vehicle density is $\rho_i(N_i) = N_i/(I_i L_i)$, and the relation with the traffic
flow is $Q_i(N_i) = \rho_i V_i(N_i) = N_i/[I_i T_i(N_i)]$.

In the following, we will linearly transform the inverse travel time $1/T_i(N_i)$ in
order to define the so-called payoff

$$P_i(N_i) = C_i - D_i N_i \qquad (5)$$

for choosing route $i$. The payoff parameters $C_i$ and $D_i$ depend on the parameters
$A_i$, $B_i$, and $N$, but will be taken constant. We have scaled the parameters so that
we have the payoff $P_i(N_i^e) = 0$ (zero payoff points) in the user equilibrium and the
payoff $N_1^o P_1(N_1^o) + N_2^o P_2(N - N_1^o) = 100N$ (an average of 100 payoff points) in the
system optimum. This serves to reach generalizable results and to provide a better
orientation to the test persons.

Note that the investigation of social (multi-person) games with linearly falling 
payoffs is not new [33]. For example, Schelling [62] has discussed situations with 
"conditional externality" , where the outcome of a decision depends on the indepen- 
dent decisions of potentially many others [62]. Pigou has addressed this problem, 
which has been recently focused on by Schreckenberg and Selten's project SUR- 
VIVE [7,65,66] and others [8,41,58]. 

The route choice game is a special congestion game [22,47,60]. More precisely
speaking, it is a multi-stage symmetrical N-person single commodity congestion
game [68]. Congestion games belong to the class of "potential games" [48], for
which many theorems are available. For example, it is known that there always
exists a Wardrop equilibrium [71] with essentially unique Nash flows [4]. This is
characterized by the property that no individual driver can decrease his or her
travel time by a different route choice. If there are several alternative routes from a
given origin to a given destination, the travel times on all used alternative routes in
the Wardrop equilibrium are the same, while roads with longer travel times are not
used. However, the Wardrop equilibrium as expected outcome of selfish routing does
not generally reach the system optimum, i.e. minimize the total travel times. Nash
flows are often inefficient, and selfish behavior implies the possibility of decreased
network performance (see footnote b). This is particularly pronounced for the Braess
paradox [10,61], according to which additional streets may sometimes increase the overall
travel time and reduce the throughput of a road network. The reason for this is the possible
existence of badly performing Nash equilibria, in which no single person can improve
his or her payoff by changing the decision behavior.

b For more details see the work by T. Roughgarden.

2, 2008 5:30 WSPC/INSTRUCTION FILE acs' final

D. Helbing, M. Schönhof, H.-U. Stark, and J. A. Hołyst

In fact, recent laboratory experiments indicate that, in a "day-to-day route 
choice scenario" based on selfish routing, the distribution of individuals over the 
alternative routes is fluctuating around the Wardrop equilibrium [27,63]. Additional 
conclusions from the laboratory experiments by Schreckenberg, Selten et al. are as 
follows [65,66]: 

• Most people, who change their decision frequently, respond to their expe- 
rience on the previous day (i.e. in the last iteration). 

• There are only a few different behavioral patterns: direct responders (44%), 
contrarian responders (14%), and conservative persons, who do not respond 
to the previous outcome. 

• It does not pay off to react to travel time information in a sensitive way, 
as conservative test persons reach the smallest travel times (the largest 
payoffs) on average. 

• People's reactions to short term travel forecasts can invalidate these. Nev- 
ertheless, travel time information helps to match the Wardrop equilibrium, 
so that excess travel times due to coordination problems are reduced. 

A closer experimental analysis based on longer time series (i.e. more iterations) for 
smaller groups of test persons reveals a more detailed picture [26] : 

• Individuals do not only show an adaptive behavior to the travel times on 
the previous day, but also change their response pattern in time [26,34]. 

• In the course of time, one finds a differentiation process which leads to the 
development of characteristic, individual response patterns, which tend to 
be almost deterministic (in contrast to mixed strategies). 

• While some test persons respond to small differences in travel times, oth- 
ers only react to medium-sized deviations, further people respond to large 
deviations, etc. In this way, overreactions of the group to deviations from 
the Wardrop equilibrium are considerably reduced. 

Note that the differentiation of individual behaviors is a way to resolve the coor- 
dination problem to match the Wardrop equilibrium exactly, i.e. which participant 
should change his or her decision in the next iteration in order to compensate for 
a deviation from it. This implies that the fractions of specific behavioral response 
patterns should depend on the parameters of the payoff function. A certain frac- 
tion of "stayers" , who do not respond to travel time information, can improve the 
coordination in the group, i.e. the overall performance. However, stayers can also 
prevent the establishment of a system optimum, if alternating reciprocity is needed, 
see Eq. (14).

3. Classification of Symmetrical 2x2 Games

In contrast to previous laboratory experiments, we have studied the route choice
game not only with a very high number of repetitions, but also with a small number
$N \in \{2,4\}$ of test persons, in order to see whether the system optimum or the
Wardrop equilibrium is established. Therefore, let us briefly discuss how the 2-person
game relates to previous game-theoretical studies.



Fig. 2. Classification of symmetrical 2x2 games according to their payoffs $P_{ij}$. Two payoff values
have been kept constant as payoffs may be linearly transformed and the two strategies of the
one-shot game renumbered. Our choice of $P_{11} = 0$ and $P_{22} = -200$ was made to define a payoff
of 0 points in the user equilibrium and an average payoff of 100 in the system optimum of our
investigated route choice game with $P_{12} = 300$ and $P_{21} = -100$.



Iterated symmetrical two-person games have been intensively studied [12,18],
including Stag Hunt, the Battle of the Sexes, or the Chicken Game (see Fig. 2).
They can all be represented by a payoff matrix of the form $P = (P_{ij})$, where $P_{ij}$
is the success ("payoff") of person 1 in a one-shot game when choosing strategy
$i \in \{1,2\}$ and meeting strategy $j \in \{1,2\}$. The respective payoffs of the second
person are given by the symmetrical values $P_{ji}$. Figure 2 shows a systematic classification
of the previously mentioned and other kinds of symmetrical two-person games [21]. The
relations

$$P_{21} > P_{11} > P_{22} > P_{12}, \qquad (6)$$

for example, define a Prisoner's Dilemma. In this paper, however, we will mainly
focus on the 2-person route choice game defined by the conditions

$$P_{12} > P_{11} > P_{21} > P_{22} \qquad (7)$$

(see Fig. 3). Despite some common properties, this game differs from the minority
game [16,39,43] or El Farol bar problem [1] with $P_{12}, P_{21} > P_{11}, P_{22}$, as a minority
decision for alternative 2 is less profitable than a majority decision for alternative
1. Although oscillatory behavior has been found in the minority game as well [9,15,
16,36,43], an interesting feature of the route choice experiments discussed in the
following is the regularity and phase-coordination (coherence) of the oscillations.

Fig. 3. Payoff specifications of the symmetrical 2x2 games investigated in this paper. a) General
payoff matrix underlying the classification scheme of Fig. 2. b), c) Two variants of the Prisoner's
Dilemma. d) Route choice game with a strategical conflict between the user equilibrium and the
system optimum.

The 2-person route choice game fits well into the classification scheme of symmetrical
2x2 games. In Rapoport and Guyer's taxonomy of 2x2 games [57], the 2-person
route choice game appears on page 211 as game number 7 together with four
other games with strongly stable equilibria. Since then, the game has almost been
forgotten and did not have a commonly known interpretation or name. Therefore,
we suggest naming it the 2-person "route choice game". Its place in the extended
Eriksson-Lindgren scheme of symmetrical 2x2 games is graphically illustrated in
Fig. 2.

According to the game-theoretical literature, there are 12 ordinally distinct,
symmetric 2x2 games [57], but after excluding strategically trivial games in the sense
of having equilibrium points that are uniquely Pareto-efficient, there remain four
archetypical 2x2 games: the Prisoner's Dilemma, the Battle of the Sexes, Chicken
(Hawk-Dove), and Leader [56]. However, this conclusion is only correct if the four
payoff values $P_{ij}$ are specified by the four values {1,2,3,4}. Taking different values
would lead to a different conclusion: If we name subscripts so that $P_{11} > P_{22}$,
a strategical conflict between a user equilibrium and the system optimum results
when

$$P_{12} + P_{21} > 2P_{11}. \qquad (8)$$

Our conjecture is that players tend to develop alternating forms of reciprocity if this
condition is fulfilled, while symmetric reciprocity is found otherwise. This has the
following implications (see Fig. 2):

• If the 2x2 games Stag Hunt, Harmony, or Pure Coordination are repeated
frequently enough, we always expect a symmetrical form of cooperation.

• For Leader and the Battle of the Sexes, we expect the establishment of 
asymmetric reciprocity, as has been found by Browning and Colman with a 
computer simulation based on a genetic algorithm incorporating mutation 
and crossing-over [11]. 

• For the games Route Choice, Deadlock, Chicken, and Prisoner's Dilemma,
both symmetric (simultaneous) and asymmetric (alternating) forms of cooperation
are possible, depending on whether condition (8) is fulfilled or
not. Note that this condition cannot be met for some games, if one restricts
oneself to ordinal payoff values $P_{ij} \in \{1,2,3,4\}$ only. Therefore, this interesting
problem has been largely neglected in the past (with a few exceptions,
e.g. [51]). In particular, convincing experimental evidence of alternating
reciprocity is missing. The following sections of this paper will, therefore,
not only propose a simulation model, but also focus on an experimental
study of this problem, which promises interesting new results.
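Condition (8) is simple enough to check mechanically for any symmetrical 2x2 payoff matrix. The sketch below is illustrative only: the function name is hypothetical, the route choice payoffs are those quoted in the caption of Fig. 2, and the second Prisoner's Dilemma uses an assumed, non-ordinal payoff that is not taken from the paper.

```python
def alternation_expected(p11, p12, p21, p22):
    """Return True if condition (8), P12 + P21 > 2*P11, holds.

    Assumes the strategies are numbered so that P11 > P22, as in the text.
    """
    if not p11 > p22:
        raise ValueError("renumber the strategies so that P11 > P22")
    return p12 + p21 > 2 * p11


# Route choice game with the payoffs given in the caption of Fig. 2.
route_choice = dict(p11=0, p12=300, p21=-100, p22=-200)

# Prisoner's Dilemma, relations (6), restricted to ordinal payoffs {1, 2, 3, 4}.
pd_ordinal = dict(p11=3, p12=1, p21=4, p22=2)

# Prisoner's Dilemma with a hypothetical, larger payoff P21 (assumption).
pd_large_p21 = dict(p11=3, p12=1, p21=10, p22=2)

for name, game in [
    ("route choice", route_choice),
    ("PD, ordinal payoffs", pd_ordinal),
    ("PD, large P21", pd_large_p21),
]:
    print(name, alternation_expected(**game))
```

The ordinal Prisoner's Dilemma cannot satisfy condition (8), since 1 + 4 is smaller than 2 x 3, whereas a cardinal variant with a sufficiently large P21 can. This illustrates why the question only arises once payoffs are not restricted to {1,2,3,4}.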



4. Experimental Results 



Fig. 4. Representative example for the emergence of coherent oscillations in a 2-person route choice
experiment with the parameters specified in Fig. 3d. Top left: Decisions of both participants over
300 iterations. Bottom left: Number $N_1(t)$ of 1-decisions over time t. Note that $N_1 = 1$ corresponds
to the system optimum, while $N_1 = 2$ corresponds to the user equilibrium of the one-shot game.
Right: Cumulative payoff of both players in the course of time t (i.e. as a function of the number
of iterations). Once the coherent oscillatory cooperation is established (t > 220), both individuals
have high payoff gains on average.



Altogether we have carried out more than 80 route choice experiments with
different experimental setups, all with different participants. In the 24 two-person
[12 four-person] experiments evaluated here (see Figs. 4-15), test persons were instructed
to choose between two possible routes between the same origin and destination.
They knew that route 1 corresponds to a 'freeway' (which may be fast
or congested), while route 2 represents an alternative route (a 'side road'). Test
persons were also informed that, if two [three] participants chose route 1,
everyone would receive 0 points, while if half of the participants chose route
1, they would receive the maximum average amount of 100 points, but 1-choosers
would profit at the cost of 2-choosers. Finally, participants were told that everyone
could reach an average of 100 points per round with variable, situation-dependent
decisions, and that the (additional) individual payment after the experiment would
depend on their cumulative payoff points reached in at least 300 rounds (100 points
= 0.01 EUR).

Let us first focus on the two-person route-choice game with the payoffs $P_{11} =
P_1(2) = 0$, $P_{12} = P_1(1) = 300$, $P_{21} = P_2(1) = -100$, and $P_{22} = P_2(2) = -200$
(see Fig. 3d), corresponding to $C_1 = 600$, $D_1 = 300$, $C_2 = 0$, and $D_2 = 100$. For
this choice of parameters, the best individual payoff in each iteration is obtained by
choosing route 1 (the "freeway") and having the co-player(s) choose route 2. Choosing
route 1 is the dominant strategy of the one-shot game, and players are tempted to
use it. This produces an initial tendency towards the "strongly stable" user equilibrium
[57] with 0 points for everyone. However, this decision behavior is not Pareto
efficient in the repeated game. Therefore, after many iterations, the players often
learn to establish the Pareto optimum of the multi-stage supergame by selecting
route 1 in turns (see Fig. 4). As a consequence, the experimental payoff distribution
shows a maximum close to 0 points in the beginning and a peak at 100 points
after many iterations (see Fig. 6), which clearly confirms that the choice behavior
of test persons tends to change over time. Nevertheless, in 7 out of 24 two-person
experiments, persistent cooperation did not emerge during the experiment. Later
on, we will identify reasons for this.

Fig. 5. Representative example for a 2-person route choice experiment, in which no alternating
cooperation was established. Due to the small changing frequency of participant 1, there were not
enough cooperative episodes that could have initiated coherent oscillations. Top left: Decisions of
both participants over 300 iterations. Bottom left: Number $N_1(t)$ of 1-decisions over time t. Right:
The cumulative payoff of both players in the course of time t shows that the individual with the
smaller changing frequency has higher profits.
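The payoff values quoted above follow directly from Eq. (5) with the stated parameters. As a small sketch (function names are hypothetical, and only the two-person game is covered), one can tabulate the one-shot payoffs and confirm that route 1 dominates while the mixed outcome gives the higher average:

```python
C = {1: 600, 2: 0}    # payoff offsets C_i quoted in the text
D = {1: 300, 2: 100}  # payoff slopes D_i quoted in the text


def payoff(route, n_on_route):
    """Eq. (5): payoff P_i(N_i) = C_i - D_i * N_i for one player on `route`."""
    return C[route] - D[route] * n_on_route


def one_shot_payoffs(choices):
    """Payoffs of all players in one iteration; `choices` is a tuple of routes (1 or 2)."""
    return tuple(payoff(route, choices.count(route)) for route in choices)


def route_1_dominates():
    """Check that route 1 is strictly better whatever the co-player chooses."""
    return all(
        one_shot_payoffs((1, other))[0] > one_shot_payoffs((2, other))[0]
        for other in (1, 2)
    )


for choices in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    pay = one_shot_payoffs(choices)
    print(choices, pay, "average:", sum(pay) / len(pay))
print("route 1 dominant:", route_1_dominates())
```

Both players on route 1 receive 0 points each, while one player on each route gives 300 and -100 points, i.e. an average of 100 that is shared fairly only if the players take turns.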



[Fig. 6 panels. Horizontal axes: Average Payoff per Iteration (Iterations 1-50), left; Average Payoff per Iteration (Iterations 250-300), right.]

Fig. 6. Frequency distributions of the average payoffs of the 48 players participating in our 24
two-person route choice experiments. Left: Distribution during the first 50 iterations. Right:
Distribution between iterations 250 and 300. The initial distribution with a maximum close to 0 points
(left) indicates a tendency towards the user equilibrium corresponding to the dominant strategy of
the one-shot game. However, after many iterations, many individuals learn to establish the system
optimum with a payoff of 100 points (right).

4.1. Emergence of cooperation and punishment

Fig. 7. Representative example for a 2-person route choice experiment, in which participant 1 leaves
the pattern of oscillatory cooperation temporarily in order to make additional profits. Note that
participant 2 does not "punish" this selfish behavior, but continues to take routes in an alternating
way. Top left: Decisions of both participants over 300 iterations. Bottom left: Number $N_1(t)$ of
1-decisions over time t. Right: Cumulative payoff of both players as a function of the number of
iterations. The different slopes indicate an unfair outcome despite high average payoffs of both
players.



In order to reach the system optimum of (-100 + 300)/2 = 100 points per iteration,
one individual has to leave the freeway for one iteration, which yields a
reduced payoff of -100 in favour of a high payoff of +300 for the other individual.
To be profitable also for the first individual, the other one should reciprocate this
"offer" by switching to route 2, while the first individual returns to route 1. Establishing
this oscillatory cooperative behavior yields 100 extra points on average. If
the other individual is not cooperative, both will be back to the user equilibrium of
0 points only, and the uncooperative individual has temporarily profited from the
offer by the other individual. This makes "offers" for cooperation and, therefore,
the establishment of the system optimum unlikely.

Hence, the innovation of oscillatory behavior requires intentional or random
changes ("trial-and-error behavior"). Moreover, the consideration of multi-period
decisions is helpful. Instead of just 2 one-stage (i.e. one-period) alternative decisions
1 and 2, there are $2^n$ different n-stage (n-period) decisions. Such multi-stage
strategies can be used to define higher-order games and particular kinds of supergame
strategies. In the two-person 2nd-order route choice game, for example, an
encounter of the two-stage decision 12 with 21 establishes the system optimum and
yields equal payoffs for everyone (see Fig. 8). Such an optimal and fair solution is
not possible for one-stage decisions. Yet, the encounter of 12 with 21 ("cooperative
episode") is not a Nash equilibrium of the two-stage game, as an individual can
increase his or her own payoff by selecting 11 (see Fig. 8). Probably for this reason,
the first cooperative episodes in a repeated route choice game (i.e. encounters of
12-decisions with 21-decisions in two subsequent iterations) often do not persist (see
Fig. 9). Another possible reason is that cooperative episodes may be overlooked.
This problem, however, can be reduced by a feedback signal that indicates when
the system optimum has been reached. For example, we have experimented with a
green background color. In this setup, a cooperative episode could be recognized by
a green background that appeared in two successive iterations together with two
different payoff values.

Fig. 8. Illustration of the concept of higher-order games defined by n-stage strategies. Left: Payoff
matrix $P = (P_{ij})$ of the one-shot 2x2 route choice game. Right: Payoff matrix
$(P_{(i_1 i_2),(j_1 j_2)}) = (P_{i_1 j_1} + P_{i_2 j_2})$ of the 2nd-order route choice game defined by
2-stage decisions. The analysis
of the one-shot game (left) predicts that the user equilibrium (with both persons choosing route
1) will establish and that no single player could increase the payoff by another decision. For
two-period decisions (right), the system optimum (strategy 12 meeting strategy 21) corresponds to
a fair solution, but one person can increase the payoff at the cost of the other (see arrow 1), if
the game is repeated. A change of the other person's decision can reduce losses and punish this
egoistic behavior (arrow 2), which is likely to establish the user equilibrium with payoff 0. In order
to leave this state again in favour of the system optimum, one person will have to make an "offer"
at the cost of a reduced payoff (arrow 3). This offer may be due to a random or intentional change
of decision. If the other person reciprocates the offer (arrow 4), the system optimum is established
again. The time-averaged payoff of this cycle lies below the system optimum.
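The statements about the 2nd-order game can be checked by enumerating all two-stage decisions. The following is a minimal sketch, assuming that the payoff of an n-stage decision is the sum of the one-shot payoffs of its stages; the helper names are hypothetical.

```python
from itertools import product

# One-shot payoffs P[(i, j)] of a player choosing i against a co-player choosing j (Fig. 3d).
P = {(1, 1): 0, (1, 2): 300, (2, 1): -100, (2, 2): -200}


def n_stage_decisions(n):
    """All 2**n sequences of route choices over n stages."""
    return list(product((1, 2), repeat=n))


def n_stage_payoff(own, other):
    """Payoff of the n-stage decision `own` against `other` (assumed: sum over stages)."""
    return sum(P[(i, j)] for i, j in zip(own, other))


def is_nash(own, other):
    """True if neither player gains by unilaterally switching to another n-stage decision."""
    alternatives = n_stage_decisions(len(own))
    own_ok = all(n_stage_payoff(a, other) <= n_stage_payoff(own, other) for a in alternatives)
    other_ok = all(n_stage_payoff(b, own) <= n_stage_payoff(other, own) for b in alternatives)
    return own_ok and other_ok


print(len(n_stage_decisions(2)))             # number of two-stage decisions
print(n_stage_payoff((1, 2), (2, 1)))        # cooperative episode, player choosing 12
print(n_stage_payoff((2, 1), (1, 2)))        # cooperative episode, player choosing 21
print(n_stage_payoff((1, 1), (2, 1)))        # deviation to 11 against 21
print(is_nash((1, 2), (2, 1)), is_nash((1, 1), (1, 1)))
```

In this sketch the encounter of 12 with 21 gives both players 200 points over two iterations, but switching to 11 gives 300, so the cooperative episode is not a Nash equilibrium, whereas both players choosing 11 is.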

The strategy of taking route 1 does not only dominate on the first day (in the 
first iteration). Even if a cooperative oscillatory behavior has been established, there 




Fig. 9. Cumulative distribution of required cooperative episodes until persistent cooperation was 
established, given that cooperation occured during the duration of the game as in 17 out of 24 
two-person experiments. The experimental data arc well approximated by the logistic curve (9) 
with the fit parameters C2 = 3.4 and d^ = 0.17. 
is a temptation to leave this state, i.e. to choose route 1 several times, as this yields 
more than 100 points on average for the uncooperative individual at the cost of the 
participant continuing an alternating choice behavior (see Figs. 7 and 8). That is, 
the conditional changing probability $p_I(2|1, N_1 = 1; t)$ of individuals $I$ from route 
1 to route 2, when the system optimum in the previous iteration was established 
(i.e. $N_1 = 1$), tends to be small initially. However, oscillatory cooperation of period 
2 needs $p_I(2|1, N_1 = 1; t) = 1$. The required transition in the decision behavior 
can actually be observed in our experimental data (see Fig. 10, left). With this 
transition, the average frequency of 1-decisions goes down to 1/2 (see Fig. 10, right). 
Note, however, that alternating reciprocity does not necessarily require oscillations 
of period 2. Longer periods are possible as well (see Fig. 11), but have occurred only 
in a few cases (namely, 3 out of 24 cases). 
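
The windowed estimate of the conditional changing probability can be illustrated by a 
minimal sketch. It assumes that the decisions of both players are available as lists of 
route choices (1 or 2); the function name `changing_probability`, the use of 
non-overlapping windows, and the example series are illustrative assumptions and not 
part of the experimental evaluation. 

```python
def changing_probability(own, other, window=50):
    """Estimate p_I(2|1, N_1 = 1; t) in consecutive windows.

    own, other: equally long sequences of route choices (1 or 2).
    Returns one estimate per window, or None if the situation
    (own choice 1, other choice 2) never occurred in that window.
    """
    if len(own) != len(other):
        raise ValueError("decision series must have equal length")
    last = len(own) - 1  # the final iteration has no successor
    estimates = []
    for start in range(0, last, window):
        stop = min(start + window, last)
        # iterations in which person I alone was on route 1
        events = [t for t in range(start, stop)
                  if own[t] == 1 and other[t] == 2]
        if not events:
            estimates.append(None)
            continue
        changes = sum(1 for t in events if own[t + 1] == 2)
        estimates.append(changes / len(events))
    return estimates


# Hypothetical series: 50 iterations of insisting on route 1,
# followed by 50 iterations of taking turns.
own = [1] * 50 + [2, 1] * 25
other = [2] * 50 + [1, 2] * 25
print(changing_probability(own, other))
```

In this made-up example, the estimate is close to 0 in the first window and equals 1 
once both players take turns, which mirrors the kind of transition described above. 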





Fig. 10. Left: Conditional changing probability $p_I(2|1, N_1 = 1; t)$ of person $I$ from route 1 (the 
"freeway") to route 2, when the other person has chosen route 2, averaged over a time window 
of 50 iterations. The transition from initially small values to 1 (for $t > 240$) is characteristic and 
illustrates the learning of cooperative behavior. In this particular group (cf. Fig. 4) the values 
started even at zero, after a transient time period of $t < 60$. Right: Proportion $P_I(1, t)$ of 
1-decisions of both participants $I$ in the two-person route choice experiment displayed in Fig. 4. 
While the initial proportion is often close to 1 (the user equilibrium), it reaches the value 1/2 
when persistent oscillatory cooperation (the system optimum) is established. 

How does the transition to oscillatory cooperation come about? The establish- 
ment of alternating reciprocity can be supported by a suitable punishment strategy: 
If the other player should have selected route 2, but has chosen route 1 instead, he 
or she can be punished by changing to route 1 as well, since this causes an average 
payoff of less than 100 points for the other person (see Fig. 8). Repeated punishment 
of uncooperative behavior can, therefore, reinforce cooperative oscillatory behavior. 
However, the establishment of oscillations also requires costly "offers" by switch- 
ing to route 2, which only pay back in case of alternating reciprocity. It does not 
matter whether these "offers" are intentional or due to exploratory trial-and-error 
behavior. 

Due to punishment strategies and similar reasons, persistent cooperation is often 
established after a number n of cooperative episodes. In the 17 of our 24 two- 
person experiments, in which persistent cooperation was established, the cumulative 
Fig. 11. Representative example for a 2-person route choice experiment with phase-coordinated 
oscillations of long (and varying) time periods larger than 2. Top left: Decisions of both participants 
over 300 iterations. Bottom left: Number $N_1(t)$ of 1-decisions over time $t$. Right: Cumulative 
payoff of both players as a function of the number of iterations. The sawtooth-like increase in the 
cumulative payoff indicates gains by phase-coordinated alternations with long oscillation periods. 

distribution of required cooperative episodes could be mathematically described by 
the logistic curve 

$$F_N(n) = \frac{1}{1 + c_N \exp(-d_N n)} \qquad (9)$$

(see Fig. 9). Note that, while we expect that this relationship is generally valid, the 
fit parameters $c_N$ and $d_N$ may depend on factors like the distribution of participant 
intelligence, as oscillatory behavior is apparently difficult to establish (see below). 
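
As a small numerical illustration, the logistic curve (9) can be evaluated and inverted 
with the fit parameters reported for the two-person experiments in Fig. 9. The function 
names below are our own choice for this sketch; only $c_2 = 3.4$ and $d_2 = 0.17$ are 
taken from the fit. 

```python
import math


def logistic_cdf(n, c, d):
    """Eq. (9): share of groups with persistent cooperation
    after at most n cooperative episodes."""
    return 1.0 / (1.0 + c * math.exp(-d * n))


def episodes_for_fraction(fraction, c, d):
    """Invert Eq. (9): (real-valued) n at which F_N(n) = fraction."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return math.log(c * fraction / (1.0 - fraction)) / d


C2, D2 = 3.4, 0.17  # fit parameters reported for Fig. 9

for n in (0, 5, 10, 20, 40):
    print(n, round(logistic_cdf(n, C2, D2), 3))

# number of cooperative episodes after which half of the
# cooperating groups have established persistent cooperation
print(episodes_for_fraction(0.5, C2, D2))
```

For a fraction of 1/2, the inversion reduces to $n = \ln(c_N)/d_N$, i.e. roughly 7 
cooperative episodes for the fitted two-person parameters. 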



4.2. Preconditions for cooperation 

Let us focus on the time period before persistent oscillatory cooperation is established 
and denote the occurrence probability that individual $I$ chooses alternative 
$i \in \{1, 2\}$ by $P_I(i)$. The quantity $p_I(j|i)$ shall represent the conditional probability 
of choosing $j$ in the next iteration, if $i$ was chosen by person $I$ in the present one. 
Assuming stationarity for reasons of simplicity, we expect the relationship 

$$p_I(2|1) P_I(1) = p_I(1|2) P_I(2), \qquad (10)$$

i.e. the (unconditional) occurrence probability $P_I(1, 2) = p_I(2|1) P_I(1)$ of having 
alternative 1 in one iteration and 2 in the next agrees with the joint occurrence probability 
$P_I(2, 1) = p_I(1|2) P_I(2)$ of finding the opposite sequence 21 of decisions: 

$$P_I(1, 2) = P_I(2, 1). \qquad (11)$$

Moreover, if $r_I$ denotes the average changing frequency of person $I$ until persistent 
cooperation is established, we have the relation 

$$r_I = P_I(1, 2) + P_I(2, 1). \qquad (12)$$

Therefore, the probability that all $N$ players simultaneously change their decision 
from one iteration to the next is $\prod_{I=1}^{N} r_I$. Note that there are $2^N$ such realizations 
of $N$ decision changes 12 or 21, which all have the same occurrence probability 
because of Eqn. (11). Among these, only the ones where $N/2$ players change from 
1 to 2 and the other $N/2$ participants change from 2 to 1 establish cooperative 
episodes, given that the system optimum corresponds to an equal distribution over 
both alternatives. Considering that the number of different possibilities of selecting 
$N/2$ out of $N$ persons is given by the binomial coefficient, the occurrence probability 
of cooperative events is 

$$P_c = \frac{1}{2^N} \binom{N}{N/2} \prod_{I=1}^{N} r_I \qquad (13)$$

(at least in the ensemble average). Since the expected time period $T$ until the 
cooperative state incidentally occurs equals the inverse of $P_c$, we finally find the 
formula 

$$T = \frac{1}{P_c} = 2^N \binom{N}{N/2}^{-1} \prod_{I=1}^{N} \frac{1}{r_I}. \qquad (14)$$

This formula is well confirmed by our 2-person experiments (see Fig. 12). It gives 
the lower bound for the expected value of the minimum number of required iter- 
ations until persistent cooperation can spontaneously emerge (if already the first 
cooperative episode is continued forever). 
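
The dependence of the expected waiting time on the group size and on the changing 
frequencies can be made concrete with a short sketch. It evaluates the occurrence 
probability of cooperative events as derived above (the share of the $2^N$ equally 
likely change patterns in which exactly $N/2$ players switch in each direction, 
multiplied by the product of the changing frequencies) and its inverse $T$. The 
function names and the changing frequency of 0.2 are purely illustrative assumptions. 

```python
from math import comb, prod


def cooperation_probability(r):
    """Probability that a cooperative episode occurs by chance.

    r: list of changing frequencies r_I, one per player
       (an even number of players is assumed).
    """
    n = len(r)
    if n == 0 or n % 2:
        raise ValueError("an even, positive number of players is assumed")
    return comb(n, n // 2) / 2 ** n * prod(r)


def expected_waiting_time(r):
    """Expected number of iterations T = 1 / P_c."""
    return 1.0 / cooperation_probability(r)


# Hypothetical changing frequency of 0.2 for every player
print(expected_waiting_time([0.2, 0.2]))       # two-person game
print(expected_waiting_time([0.2] * 4))        # four-person game
```

With these assumed values, the expected waiting time grows from about 50 iterations 
for two players to more than 1600 iterations for four players, which illustrates why 
small changing frequencies can prevent cooperation within an experiment of limited 
duration. 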
Fig. 12. Comparison of the required number of cooperative episodes $y$ with the expected number 
$x$ of cooperative episodes (approximated as occurrence time of persistent cooperation, divided by 
the expected time interval $T$ until a cooperative episode occurs by chance). Note that the data 
points support the relationship $y = x$ and, thereby, formula (14). 



Obviously, the occurrence of oscillatory cooperation is expected to take much 
longer for a large number $N$ of participants. This tendency is confirmed by our 
4-person experiments compared to our 2-person experiments. It is also in agreement 
with intuition, as coordination of more people is more difficult. (Note that mean 
first passage or transition times in statistical physics tend to grow exponentially in 
the number $N$ of particles as well.) 

Besides the number $N$ of participants, the changing frequencies $r_I$ are another 
critical factor for the cooperation probability: They are needed for the exploration of 
innovative strategies, coordination and cooperation. Although the instruction of 
test persons would have allowed them to conclude that taking turns would be a 
good strategy, the changing frequencies $r_I$ of some individuals were so small that 
cooperation did not occur within the duration of the respective experiment, in 
accordance with formula (14). The unwillingness of some individuals to vary their 
decisions is sometimes called "conservative" [7,65,66] or "inertial behavior" [9]. Note 
that, if a player never reciprocates "offers" by other players, this may discourage 
further "offers" and reduce the changing frequency of the other player(s) as well 
(see the decisions 50 through 150 of player 2 in Fig. 4). 

Our experimental time series show that most individuals initially did not know that a 
periodic decision behavior would allow them to establish the system optimum. This 
indicates that the required depth of strategic reasoning [19] and the related 
complexity of the game for an average person are already quite high, so that intelligence 
may matter. Compared to control experiments, the hint that the maximum average 
payoff of 100 points per round could be reached "by variable, situation-dependent 
decisions" increased the average changing frequency (by 75 percent) and with this 
the occurrence frequency of cooperative events. Thereby, it also increased the chance 
that persistent cooperation was established during the duration of the experiment. 

Note that successful cooperation requires not only coordination [9] , but also in- 
novation: In their first route choice game, most test persons discover the oscillatory 
cooperation strategy only by chance in accordance with formula (14). The chang- 
ing frequency is, therefore, critical for the establishment of innovative strategies: 
It determines the exploratory trial-and-error behavior. In contrast, cooperation is 
Fig. 13. Experimentally observed decision behavior when two groups involved in two-person route 
choice experiments afterwards played a four-person game with $C_1 = 900$, $D_1 = 300$, $C_2 = 100$, 
$D_2 = 100$. Left: While oscillations of period 2 emerged in the second group (bottom), another 
alternating pattern corresponding to $n$-period decisions with $n > 2$ emerged in the first group 
(top). Right: After all persons had learnt oscillatory cooperative behavior, the four-person game 
just required coordination, but not the invention of a cooperative strategy. Therefore, persistent 
cooperation was quickly established (in contrast to four-person experiments with new participants). 
It is clearly visible that the test persons continued to apply similar decision strategies (right) as 
in the previous two-person experiments (left). 
easy when test persons know that the oscillatory strategy is successful: When two 
teams, who had successfully cooperated in 2-person games, afterwards had to play 
a 4-person game, cooperation was always and quickly established (see Fig. 13). 
In contrast, inexperienced co-players suppressed the establishment of oscillatory 
cooperation in 4-person route choice games. 



4.3. Strategy coefficients 

In order to characterize the strategic behavior of individuals and predict their 
chances of cooperation, we have introduced some strategy coefficients. For this, 
let us introduce the following quantities, which are determined from the iterations 
before persistent cooperation is established: 

• $c_I^k$ = relative frequency of a changed subsequent decision of individual $I$ if 
the payoff was negative ($k = -$), zero ($k = 0$), or positive ($k = +$). 

• $s_I^k$ = relative frequency of individual $I$ to stay with the previous decision if 
the payoff was negative ($k = -$), zero ($k = 0$), or positive ($k = +$). 

The Yule-coefficient 

$$Q_I = \frac{c_I^- s_I^+ - c_I^+ s_I^-}{c_I^- s_I^+ + c_I^+ s_I^-} \qquad (15)$$

with $-1 \le Q_I \le 1$ was used by Schreckenberg, Selten et al. [65] to identify direct 
responders with $0.5 < Q_I \le 1$ (who change their decision after a negative payoff 
and stay after a positive payoff), and contrarian responders with $-0.5 > Q_I \ge -1$ 
(who change their decision after a positive payoff and stay after a negative one). A 
random decision behavior would correspond to a value $Q_I \approx 0$. However, a problem 
arises if one of the variables $c_I^-$, $s_I^+$, $c_I^+$, or $s_I^-$ assumes the value 0. Then, we have 
$Q_I \in \{-1, 1\}$, independently of the other three values. If two of the variables become 
zero, $Q_I$ is sometimes even undefined. Moreover, if the values are small, the resulting 
conclusion is not reliable. Therefore, we prefer to use the percentage difference 

$$S_I = \frac{c_I^-}{c_I^- + s_I^-} - \frac{c_I^+}{c_I^+ + s_I^+} \qquad (16)$$

for the assessment of strategies. Again, we have $-1 \le S_I \le 1$. Direct responders 
correspond to $S_I > 0.25$ and contrarian responders to $S_I < -0.25$. For 
$-0.25 \le S_I \le 0.25$, the response to the previous payoff is rather random. 
In addition, we have introduced the Z-coefficient 

$$Z_I = \frac{c_I^0}{c_I^0 + s_I^0}, \qquad (17)$$

for which we have $0 \le Z_I \le 1$. This coefficient describes the likely response of 
individual $I$ to the user equilibrium. $Z_I = 0$ means that individual $I$ does not change 
routes, if the user equilibrium was reached. $Z_I = 1$ implies that person $I$ always 
changes, while $Z_I \approx 0.5$ indicates a random response. 
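
The following sketch illustrates how the three strategy coefficients could be computed 
from counted decision changes and stays. It is a minimal example under our own 
assumptions: the function names and the example counts are hypothetical, and absolute 
counts are used in place of relative frequencies, which leaves the ratios unchanged. 

```python
def yule_q(c_neg, s_neg, c_pos, s_pos):
    """Yule coefficient Q_I; None if it is undefined (zero denominator)."""
    numerator = c_neg * s_pos - c_pos * s_neg
    denominator = c_neg * s_pos + c_pos * s_neg
    return None if denominator == 0 else numerator / denominator


def s_coefficient(c_neg, s_neg, c_pos, s_pos):
    """Percentage difference S_I: share of changes after a negative
    payoff minus share of changes after a positive payoff."""
    return c_neg / (c_neg + s_neg) - c_pos / (c_pos + s_pos)


def z_coefficient(c_zero, s_zero):
    """Z_I: share of changes after a zero payoff (user equilibrium)."""
    return c_zero / (c_zero + s_zero)


def response_type(s_value):
    """Classification by S_I with the thresholds +/- 0.25."""
    if s_value > 0.25:
        return "direct"
    if s_value < -0.25:
        return "contrarian"
    return "random"


# Hypothetical counts: the player changed once in ten negative-payoff
# situations and never in ten positive-payoff situations.
q = yule_q(c_neg=1, s_neg=9, c_pos=0, s_pos=10)
s = s_coefficient(c_neg=1, s_neg=9, c_pos=0, s_pos=10)
print(q, s, response_type(s))
```

In this made-up case, a single zero count drives the Yule coefficient to 1, although 
the player hardly ever changes. The S-coefficient of 0.1 classifies the same behavior 
as a rather random response, which is the motivation given above for preferring it. 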
Fig. 14. Coefficients $S_I$ and $Z_I$ of both participants $I$ in all 24 two-person route choice games. The 
values of the S-coefficients (i.e. the individual tendencies towards direct or contrarian responses) 
are not very significant for the establishment of persistent cooperation, while large enough values 
of the Z-coefficient stand for the emergence of oscillatory cooperation. 



Figure 14 shows the result of the 2-person route choice experiments (cooperation 
or not) as a function of $S_1$ and $S_2$, and as a function of $Z_1$ and $Z_2$. Moreover, 
Figure 15 displays the result as a function of the average strategy coefficients 

$$Z = \frac{1}{N} \sum_{I=1}^{N} Z_I \qquad (18)$$

and 

$$S = \frac{1}{N} \sum_{I=1}^{N} S_I. \qquad (19)$$

Our experimental data indicate that the Z-coefficient is a good indicator for the 
establishment of cooperation, while the S-coefficient seems to be rather insignificant 
(which also applies to the Yule coefficient). 
Fig. 15. S- and Z-coefficients averaged over both participants in all 24 two-person route choice 
games. The mainly small, but positive values of $S$ indicate a slight tendency towards direct 
responses. However, the S-coefficient is barely significant for the emergence of persistent oscillations. 
A good indicator for their establishment is a sufficiently large Z-value. 
5. Multi-Agent Simulation Model 

In a first attempt, we have tried to reproduce the observed behavior in our 
2-person route choice experiments by game-dynamical equations [28]. We have applied 
these to the 2×2 route choice game and its corresponding two-, three- and 
four-stage higher-order games (see Sec. 4.1). Instead of describing patterns of alternating 
cooperation, however, the game-dynamical equations predicted a preference for the 
dominant strategy of the one-shot game, i.e. a tendency towards choosing route 1. 

The reason for this becomes understandable through Fig. 8. Selecting routes 
2 and 1 in an alternating way is not a stable strategy, as the other player can 
get a higher payoff by choosing route 1 twice rather than responding with 1 
and 2. Selecting route 1 all the time even guarantees that one's own payoff is never 
below that of the other player. However, when both players select route 1 and 
establish the related user equilibrium, no player can improve his or her payoff in 
the next iteration by changing the decision. Nevertheless, it is possible to improve 
the long-term outcome, if both players change their decisions, and if they do it in 
a coordinated way. Note, however, that a strict alternating behavior of period 2 
is an optimal strategy only in infinitely repeated games, while it is unstable to 
perturbations in finite games. 

It is known that cooperative behavior may be explained by a "shadow of the 
future" [2,3], but it can also be established by a "shadow of the past" [40], i.e. 
experience-based learning. This will be the approach of the multi-agent simula- 
tion model proposed in this section. As indicated before, the emergence of phase- 
coordinated strategic alternation (rather than a statistically independent applica- 
tion of mixed strategies) requires an almost deterministic behavior (see Fig. 16). 
Nevertheless, some weak stochasticity is needed for the establishment of asymmetric 
cooperation, both for the exploration of innovative strategies and for phase coor- 
dination. Therefore, we propose the following reinforcement learning model, which 
could be called a generalized win-stay, lose-shift strategy [50,54]. 
Fig. 16. Representative example for a 2-person route choice simulation based on our proposed 
multi-agent reinforcement learning model with $P_I^{\max} = 100$ and $P_I^{\min} = -200$. The parameter $\nu_I^1$ 
has been set to 0.25. The other model parameters are specified in the text. Top left: Decisions 
of both agents over 300 iterations. Bottom left: Number $N_1(t)$ of 1-decisions over time $t$. Right: 
Cumulative payoff of both agents as a function of the number of iterations. The emergence of 
oscillatory cooperation is comparable with the experimental data displayed in Fig. 4. 
Let us presuppose that an individual approximately memorizes or has a good 
feeling of how well he or she has performed on average in the last $n_I$ iterations and 
since he or she has last responded with decision $j$ to the situation $(i, N_1)$. In our 
success- and history-dependent model of individual decision behavior, $p_I(j|i, N_1; t)$ 
denotes agent $I$'s conditional probability of taking decision $j$ at time $t + 1$, when $i$ 
was selected at time $t$ and $N_1(t)$ agents had chosen alternative 1. Assuming that $p_I$ 
is either 0 or 1, $p_I(j|i, N_1; t)$ has the meaning of a deterministic response strategy: 
$p_I(j|i, N_1; t) = 1$ implies that individual $I$ will respond at time $t + 1$ with the decision 
$j$ to the situation $(i, N_1)$ at time $t$. 



Our reinforcement learning strategy can be formulated as follows: The response 
strategy $p_I(j|i, N_1; t)$ is switched with probability $q_I > 0$, if the average individual 
payoff since the last comparable situation with $i(t') = i(t)$ and $N_1(t') = N_1(t)$ 
at time $t' < t$ is less than the average individual payoff $\bar{P}_I(t)$ during the last $n_I$ 
iterations. In other words, if the time-dependent aspiration level $\bar{P}_I(t)$ [40,54] is not 
reached by the agent's average payoff since his or her last comparable decision, the 
individual is assumed to substitute the response strategy $p_I(j|i, N_1; t)$ by 

$$p_I(j|i, N_1; t+1) = 1 - p_I(j|i, N_1; t) \qquad (20)$$

with probability $q_I$. The replacement of dissatisfactory strategies is oriented at historical 
long-term profits (namely, during the time period $[t', t]$). Thereby, it avoids 
short-sighted changes after temporary losses. Moreover, it does not assume a comparison 
of the performance of the actually applied strategy with hypothetical ones as in most 
evolutionary models. A readiness for altruistic decisions is also not required, while 
exploratory behavior ("trial and error") is necessary. In order to reflect this, the 
decision behavior is randomly switched from $p_I(j|i, N_1; t+1)$ to $1 - p_I(j|i, N_1; t+1)$ 
with probability 

(21) 

Herein, $P_I^{\min}$ and $P_I^{\max}$ denote the minimum and maximum average payoff of all $N$ 
agents (simulated players). The parameter $\nu_I^1$ reflects the mutation frequency for 
$\bar{P}_I(t) = P_I^{\min}$, while the mutation frequency is assumed to be $\nu_I^0 \le \nu_I^1$ when the 
time-averaged payoff $\bar{P}_I$ reaches the system optimum $P_I^{\max}$. 

In our simulations, no emergent cooperation is found for $\nu_I^0 = \nu_I^1 = 0$. Values 
$\nu_I^0 > 0$ or odd values of $n_I$ may produce intermittent breakdowns of cooperation. A small, 
but finite value of $\nu_I^1$ is important to find a transition to persistent cooperation. 
Therefore, we have used the parameter value $\nu_I^1 = 0.25$, while the simplest possible 
specification has been chosen for the other parameters, namely $\nu_I^0 = 0$, $q_I = 1$, and 
$n_I = 2$. 

The initial conditions for the simulation of the route choice game were specified 
in accordance with the dominant strategy of the one-shot game, i.e. $P_I(1, 0) = 1$ 
(everyone tends to choose the freeway initially), $p_I(2|1, N_1; 0) = 0$ (it is not attractive 
to change from the freeway to the side road) and $p_I(1|2, N_1; 0) = 1$ (it is tempting 
Fig. 17. Left: Conditional changing probability $p_I(2|1, N_1 = 1; t)$ of agent $I$ from route 1 (the 
"freeway") to route 2, when the other agent has chosen route 2, averaged over a time window of 50 
iterations. The transition from small values to 1 for the computer simulation displayed in Fig. 16 
is characteristic and illustrates the learning of cooperative behavior. Right: Proportion $P_I(1, t)$ of 
1-decisions of both participants $I$ in the two-person route choice simulation displayed in Fig. 16. 
While the initial proportion is often close to 1 (the user equilibrium), it reaches the value 1/2 when 
persistent oscillatory cooperation (the system optimum) is established. The simulation results are 
compatible with the essential features of the experimental data (see, for example, Fig. 10). 

to change from the side road to the freeway). Interestingly enough, agents learnt 
to acquire the response strategy $p_I(2|1, N_1 = 1; t) = 1$ in the course of time, which 
established oscillatory cooperation with higher profits (see Figs. 16 and 17). 
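
A single learning step of such an agent may be sketched as follows. This is a 
simplified illustration and not the simulation code behind Figs. 16 and 17: the 
function names are our own, and the linear interpolation of the mutation probability 
between its two limiting values is an assumption that is merely compatible with the 
verbal description (largest mutation frequency at the minimum average payoff, smallest 
at the system optimum). 

```python
import random


def mutation_probability(avg_payoff, p_min, p_max, nu0=0.0, nu1=0.25):
    """Assumed linear interpolation of the mutation frequency:
    nu1 at the minimum payoff p_min, nu0 at the maximum payoff p_max."""
    share = (p_max - avg_payoff) / (p_max - p_min)
    share = min(1.0, max(0.0, share))  # keep the result within [nu0, nu1]
    return nu0 + (nu1 - nu0) * share


def update_response(p, payoff_since_last, aspiration, nu, q=1.0, rng=random):
    """One learning step for a deterministic response strategy p in {0, 1}.

    payoff_since_last: average payoff since the last comparable situation
    aspiration: average payoff during the last n_I iterations
    nu: mutation probability, q: switching probability
    """
    # Reinforcement: replace a dissatisfactory response strategy
    if payoff_since_last < aspiration and rng.random() < q:
        p = 1 - p
    # Exploration ("trial and error"): random switch with probability nu
    if rng.random() < nu:
        p = 1 - p
    return p


# Hypothetical situation: the agent never leaves the freeway (p = 0),
# but this response has paid less than the recent average payoff.
rng = random.Random(1)
nu = mutation_probability(avg_payoff=100, p_min=-200, p_max=100)
print(update_response(0, payoff_since_last=-100.0, aspiration=50.0,
                      nu=nu, rng=rng))
```

With $q_I = 1$ and a vanishing mutation probability at the system optimum, the step is 
deterministic: a response that falls short of the aspiration level is replaced, while 
a satisfactory one is kept. 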

Note that the above described reinforcement learning model [40] responds only 
to the own previous experience [13]. Despite its simplicity (e.g. the neglect of 
more powerful, but probably less realistic $k$-move memories [11]), our "multi-agent" 
simulations reproduce the emergence of asymmetric reciprocity of two or more 
players, if an oscillatory strategy of period 2 can establish the system optimum. 
This raises the question why previous experiments of the $N$-person route choice 







Fig. 18. Left: Comparison of the required number of cooperative episodes with the expected number 
of cooperative episodes in our multi-agent simulation of decisions in the route choice game. Note 
that the data points support formula (14). Right: Cumulative distribution of required cooperative 
episodes until persistent cooperation is established in our 2-person route choice simulations, using 
the simplest specification of model parameters (not calibrated). The simulation data are well 
approximated by the logistic curve (9) with the fit parameters $c_2 = 7.9$ and $d_2 = 0.41$. 
game [27,63] have observed a clear tendency towards the Wardrop equilibrium [71] 
with $P_1(N_1) = P_2(N_2)$ rather than phase-coordinated oscillations. It turns out 
that the payoff values must be suitably chosen [see Eq. (8)] and that several 
hundred repetitions are needed. In fact, the expected time interval $T$ until a cooperative 
episode among $N = N_1 + N_2$ participants occurs by chance in our simulations is well 
described by formula (14), see Fig. 18. The empirically observed transition in the 
decision behavior displayed in Fig. 10 is qualitatively reproduced by our computer 
simulations as well (see Fig. 17). The same applies to the frequency distribution of 
the average payoff values (compare Fig. 19 with Fig. 6) or to the number of expected 
and required cooperative episodes (compare Fig. 18 with Figs. 9 and 12). 
Fig. 19. Frequency distributions of the average payoffs in our computer simulations of the 2-person 
route choice game. Left: Distribution during the first 50 iterations. Right: Distribution between 
iterations 250 and 300. Our simulation results are compatible with the experimental data displayed 
in Fig. 6. 



5.1. Simultaneous and alternating cooperation in the Prisoner's 
Dilemma 

Let us finally simulate the dynamic behavior in the two different variants of the 
Prisoner's Dilemma indicated in Fig. 3b, c with the above experience-based 
reinforcement learning model. Again, we will assume $P_{11} = 0$ and $P_{22} = -200$. 
According to Eq. (8), a simultaneous, symmetrical form of cooperation is expected 
for $P_{12} = -300$ and $P_{21} = 100$, while an alternating, asymmetric cooperation is 
expected for $P_{12} = -300$ and $P_{21} = 500$. Figure 20 shows simulation results for the 
two different cases of the Prisoner's Dilemma and confirms the two predicted forms 
of cooperation. Again, we varied only the parameter $\nu_I^1$, while we chose the 
simplest possible specification of the other parameters $\nu_I^0 = 0$, $q_I = 1$, and $n_I = 2$. The 
initial conditions were specified in accordance with the expected non-cooperative 
outcome of the one-shot game, i.e. $P_I(1, 0) = 0$ (everyone defects in the beginning), 
$p_I(2|2, N_1; 0) = 1$ (it is tempting to continue defecting), $p_I(1|1, N_1 = 1; 0) = 0$ (it is 
unfavourable to be the only cooperative player), and $p_I(1|1, N_1 = 2; 0) = 1$ (it is 
good to continue cooperating, if the other player cooperates). In the course of time, 
agents learn to acquire the response strategy $p_I(2|2, N_1 = 0; t) = 0$ when 
simultaneous cooperation evolves, but $p_I(2|2, N_1 = 1; t) = 0$ when alternating cooperation 
is established. 
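
The distinction between the two forms of cooperation can be checked with a few lines 
of code. The sketch below applies the inequality $P_{12} + P_{21} > 2P_{11}$, which is 
stated in the summary as the condition for coherent oscillations, to the two payoff 
sets used here. The function names are hypothetical, and treating the case of equality 
as simultaneous cooperation is our own convention. 

```python
def cooperation_form(p11, p12, p21):
    """Expected form of cooperation in the symmetrical 2x2 game.

    Alternating (turn-taking) cooperation pays off if the average of the
    two asymmetric payoffs exceeds the payoff for simultaneous cooperation.
    """
    return "alternating" if p12 + p21 > 2 * p11 else "simultaneous"


def system_optimum_payoff(p11, p12, p21):
    """Average payoff per player and iteration in the system optimum."""
    return max(p11, (p12 + p21) / 2)


# The two variants of the Prisoner's Dilemma simulated in this section
for p12, p21 in [(-300, 100), (-300, 500)]:
    print(p12, p21,
          cooperation_form(0, p12, p21),
          system_optimum_payoff(0, p12, p21))
```

For the first payoff set, simultaneous cooperation with an average payoff of 0 is 
expected, while the second set favours taking turns with an average payoff of 100 
points per player. 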
Fig. 20. Representative examples for computer simulations of the two different forms of the 
Prisoner's Dilemma specified in Fig. 3b, c. The parameter $\nu_I^1$ has been set to 0.25, while the other 
model parameters are specified in the text. Top: Emergence of simultaneous, symmetrical 
cooperation, where decision 2 corresponds to defection and decision 1 to cooperation. The system optimum 
corresponds to $P_I^{\max} = 0$ payoff points, and the minimum payoff to $P_I^{\min} = -200$. Bottom: 
Emergence of alternating, asymmetric cooperation with $P_I^{\max} = 100$ and $P_I^{\min} = -200$. Left: Time 
series of the agents' decisions and the number $N_1(t)$ of 1-decisions. Right: Cumulative payoffs as 
a function of time $t$. 



6. Summary, Discussion, and Outlook 

In this paper, we have investigated the $N$-person day-to-day route-choice game. 
This special congestion game has not been thoroughly studied before in the case 
of small groups, where the system optimum can considerably differ from the user 
equilibrium. The 2-person route choice game gives a meaning to a previously 
uncommon repeated symmetrical 2×2 game and shows a transition from the dominating 
strategy of the one-shot game to coherent oscillations, if $P_{12} + P_{21} > 2P_{11}$. 
However, a detailed analysis of laboratory experiments with humans reveals that the 
establishment of this phase-coordinated alternating reciprocity, which is expected 
to occur in other 2×2 games as well, is quite complex. It needs either strategic 
experience or the invention of a suitable strategy. Such an innovation is driven by 
the potential gains in the average payoffs of all participants and seems to be based 
on exploratory trial-and-error behavior. If the changing frequency of one or 
several players is too low, no cooperation is established for a long time. Moreover, the 
emergence of cooperation requires certain kinds of strategies, which can be 
characterized by the Z-coefficient (18). These strategies can be acquired by means of 
reinforcement learning, i.e. by keeping response patterns which have turned out to 
be better than average, while worse response patterns are being replaced. The 
punishment of uncooperative behavior can help to enforce cooperation. Note, however, 
that punishment in groups of $N > 2$ persons is difficult, as it is hard to target the 
uncooperative person, and punishment hits everyone. Nevertheless, computer 
simulations and additional experiments indicate that oscillatory cooperation can still 
emerge in route choice games with more than 2 players after a long time period 
(rarely within 300 iterations) (see Fig. 21). 







Fig. 21. Emergence of phase-coordinated oscillatory behavior in the 4-person route choice game 
with the parameters specified in Fig. 13. Left: Experimental data of the decisions of 4 inexperienced 
participants over 300 iterations. Right: Computer simulation with the reinforcement learning model. 



Altogether, spontaneous cooperation takes a long time. It is, therefore, sensitive 
to changing conditions reflected by time-dependent payoff parameters. As a con- 
sequence, emergent cooperation is unlikely to appear in real traffic systems. This 
is the reason why the Wardrop equilibrium tends to occur. However, cooperation 
could be rapidly established by means of advanced traveller information systems 
(ATIS) [8,14,30,37,41,63,70,73], which would avoid the slow learning process de- 
scribed by Eq. (14). Moreover, while we do not recommend conventional congestion 
charges, a charge for unfair usage patterns would support the compliance with indi- 
vidual route choice recommendations. It would supplement the inefficient individual 
punishment mechanism. 

Different road pricing schemes have been proposed, each of which has its own 
advantages and disadvantages or side effects. Congestion charges, for example, could 
discourage drivers from taking congested routes, although using them is actually 
required to reach minimum average travel times. Conventional tolls and road pricing 
may reduce the trip frequency due to budget constraints, which potentially interferes 
with economic growth and fair chances for everyone's mobility. 

In order to activate capacity reserves, we therefore propose an automated route 
guidance system based on the following principles: After specification of their 
destination, drivers should get individual (and, on average, fair) route choice 
recommendations in agreement with the traffic situation and the route choice proportions 
required to reach the system optimum. If an individual selects a faster route instead 
of the recommended route, he or she will have to pay an amount proportional 
to the decrease in the overall inverse travel time compared to the system optimum. 
Moreover, drivers not in a hurry should be encouraged to take the slower route by 
receiving the amount of money corresponding to the related increase in the overall 
inverse travel time. Altogether, such an ATIS could support the system optimum 
while allowing for some flexibility in route choice. Moreover, the fair usage pattern 
would be cost-neutral for everyone, i.e. traffic flows of potential economic relevance 
would not be suppressed by extra costs. 

In systems with many similar routing decisions, a Pareto optimum characterized 
by asymmetric alternating cooperation may emerge even spontaneously. This could 
help to enhance the routing in data networks [72] and generally to resolve Braess-like 
paradoxes in networks [17]. 

Finally, it cannot be emphasized enough that taking turns is a promising 
strategy to distribute scarce resources in a fair and optimal way. It could be applied 
to a huge number of real-life situations due to its relevance for many strategic 
conflicts, including Leader, the Battle of the Sexes, and variants of Route Choice, 
Deadlock, Chicken, and the Prisoner's Dilemma. The same applies to their $N$-person 
generalizations, in particular social dilemmas [23,25,40]. It will also be interesting 
to find out whether and where metabolic pathways, biological supply networks, or 
information flows in neuronal and immune systems use alternating strategies to 
avoid the wasting of costly resources. 